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Abstract — Rank estimation is a classical model order selection 
problem that arises in a variety of important statistical signal and 
array processing systems, yet is addressed relatively infrequently 
in the extant literature. Here we present sample covariance 
asymptotics stemming from random matrix theory, and bring 
them to bear on the problem of optimal rank estimation in the 
context of the standard array observation model with additive 
white Gaussian noise. The most significant of these results demon- 
strates the existence of a phase transition threshold, below which 
eigenvalues and associated eigenvectors of the sample covariance 
fail to provide any information on population eigenvalues. We 
then develop a decision-theoretic rank estimation framework that 
leads to a simple ordered selection rule based on thresholding; in 
contrast to competing approaches, however, it admits asymptotic 
minimax optimality and is free of tuning parameters. We analyze 
the asymptotic performance of our rank selection procedure and 
conclude with a brief simulation study demonstrating its practical 
efficacy in the context of subspace tracking. 

Index Terms — Adaptive beamforming, array processing, ran- 
dom matrix theory, sample covariance matrix, subspace tracking. 



observation model with additive white Gaussian noise, and in 



Section III we present sample covariance asymptotics based on 



random matrix theory. In Section IV we bring these results to 



bear on the problem of optimal rank estimation by developing 
a decision-theoretic rank estimation framework, and associated 
algorithm whose asymptotic minimax optimality we prove. We 
then provide a brief simulation study in Section |V] to demon- 
strate the practical efficacy of our rank selection procedure, 



and conclude with a summary discussion in Section VI 



II. Problem Formulation: Model Order Selection 

Suppose at time t we observe data vector x{t) e C" (or 
E"), a weighted combination of r signal vectors corrupted by 
additive noise. Using the notation Air to denote an additive 
observation model of order r, we can express this as 

r 

Mr : x{t) = s,{t)ai + n{t) = As{t) + n{t), (1) 



I. Introduction 

RANK estimation is a important model order selection 
problem that arises in a variety of critical engineering 
applications, most notably those associated with statistical 
signal and array processing. Adaptive beamforming and sub- 
space tracking provide two canonical examples in which one 
typically assumes ri-dimensional observations that linearly 
decompose into a "signal" subspace of dimension r ^ n 
and complementary "noise" subspace of dimension n — r. In 
many applications the goal is to enhance, null, or track certain 
elements of the signal subspace, based on observed array data. 

In this context, we of course recover the classical statistical 
trade-offs between goodness of fit and model order; i.e., 
between system performance and complexity. In the rank 
selection case these trade-offs are particularly clear and com- 
pelling, as subspace rank estimation may well correspond to 
correctly identifying the number of interferers or signals of 
interest. We note that the goal of this article is a general 
understanding of the model order selection problem in this 
context, rather than an exhaustive application-specific solution. 

To this end, we present here a collection of recent results 
from random matrix theory, and show how they enable a 
theoretical analysis of this instantiation of the model order 
selection problem. In Section [II] we formulate the problem 
of model order selection in the context of the standard array 
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where A = (oj^ ' ' ' a^) is an n x r mixing matrix and 

sit) = {si{t), S2{t), . . . , Sr{t)) an r X 1 signal vector Here, 
we assume that A is a deterministic matrix of weights and s{t) 
is a random source vector with nonsingular covariance matrix 
Cs. If the noise n{t) is independent of the source and white, 
with all components having variance a'^, then the covariance 
of x{t) is given by 



C = E [x{t)x{t)*] ^ ACsA* + (7^1. 



(2) 



In most array processing applications, it is generally desired 
to estimate or determine A based on "snapshot" data vectors 
x{t); however, note that the decomposition in (|2]l is not 
statistically identifiable. We can render it identifiable up to 
a sign change by reparametrizing as 



C = WAW* 



(3) 



with W = (wi 



an n X r matrix having 



orthonormal columns and A = diag(Ai, A2, . . . , A^) a di- 
agonal matrix with Ai > A2 > • • • > A^ > 0. Then, in 
many applications we can recover the parameters of interest 
from W using specialized knowledge of the structure of A in 
conjunction with well-known algorithmic approaches such as 
MUSIC |1| or ESPRIT |2|. 

In general, then, the goal of subspace tracking is to estimate 
W as well as possible from observed array data x. Often, the 
associated data covariance C will change in time; signals may 
change their physical characteristics or direction of arrival, 
with some ceasing while others appear Such situations require 
adaptive estimation of W. Over the past twenty years, a 
variety of algorithms have been developed for recursively 
updating estimates of the dominant subspace of a sample co- 
variance matrix. Projection Approximation Subspace Tracking 
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(PAST) [3] is among the most popular, though many new 
algorithms continue to be proposed (see, e.g., |j4|-||6j). 

A deficiency of nearly all of these algorithms is that they 
require prior knowledge of r, the rank of the desired signal 
subspace. In relative terms, little attention has been devoted 
in the extant literature to estimating r in an optimal manner. 
Kavcic and Yang (7), Rabideau |8], and Real et al. |j9| 
separately address the problem, but ultimately all rely on 
selection rules with problem-specific tuning parameters. More 
recently, Shi et al. pO) suggest a modification of ||8); each of 
these approaches seeks to derive appropriate tuning parameters 
based on user-specified false alarm rates. 

In parallel to the above developments, the literature on 
random matrix theory has progressed substantially over the 
past ten years. An important outgrowth of this theory is 
a new set of rank-selection tools. To this end, Owen and 
Perry fTV\ suggest a cross-validation-based approach that, for 
computational reasons, does not appear immediately applica- 



ble to real-time subspace tracking. Kritchman and Nadler ( 12| 
provide a survey of other approaches, including those based 
on information-theoretic criteria (13] , [14] , ultimately recom- 
mending a method based on eigenvalue thresholding. Their 
threshold is determined from a specified false alarm rate using 
the theory developed in |15J-|20|. 

Even for the rank selection procedures with interpretable 
tuning parameters such as false alarm rate, it is not clear 
how these parameters should be chosen. In the sequel, we 
summarize recent sample covariance asymptotics and employ 
them to develop a decision-theoretic framework for estimation 
with specified costs for over- or under-estimating the rank. 
Within this framework we derive a selection rule based on 
eigenvalue thresholding, without the need for tuning parame- 
ters, that minimizes the maximum risk under a set of suitable 
alternate models. 

III. Sample Covariance Asymptotics 



the rth-order signal-plus-noise observation 
x{t) = As{t) + n{t) of ([TJ gives rise to the 



Recall that 
model M. 

covariance form C defined in (|2]) and ([3]). The rank selection 
rule we derive is based on a sample covariance matrix C 
comprised of N array snapshots x{t), indexed as a function 
of time t. For N < t, this empirical covariance estimator is 
defined as 



k)*, 



(4) 



fc=0 



and our appeal to random matrix theory will rely on properties 
of C as the number of snapshots N becomes large. 

As is customary in the case of random matrix theory, we 
work in an asymptotic setting with N and n both tending to 
infinity, and their ratio n/N tending to a constant 7 e (0, 00). 
Moreover, we suppose that the number of signals r is fixed 
with respect to this asymptotic setting, although it is likely 
that this assumption can in fact be relaxed to r = o(i/n). To 
simplify the presentation of the theory, we also suppose strict 
inequality in the ordering Ai > A2 > • • • > A,. > with 
respect to the decomposition of ([3]l of the true covariance 




Fig. 1. Density of the Tracy-Widom law Fp(x) for real (/3 = 1) and com plex 
(13 = 2) cases, computed using the online software packages (21|, (22). 



C. Note that under the assumed model of ([T]i, the actual 
eigenvalues of C are given by {Ai + U {c^liLr+i- 

To begin, we define the eigendecomposition of the empirical 
covariance C of (Hll as 



C = WLW*, 



(5) 



where W = 
and L 



7^2 • • • Wjj) has orthonormal columns 
£2, ■ . ■,(„) has £1 > £2 > ■■■ > in > 
0. Now consider the additive observation model Aio of ([T]) 
corresponding to r = 0; i.e., in the absence of signal: 



diag(^i 



Mo ■ x{t) = ^ s^{t)a,^ + n{t), 



0. 



This Oth-order case defines a natural null model for our 
rank estimation problem, in that x{t) — n{t), and hence the 
observed snapshots x{t) that comprise C will consist entirely 
of noise. In the setting where x{t) is white Gaussian noise, 
Johansson p3) derives the distribution of £1, the principal 
eigenvalue of C, for complex-valued data, and Johnstone |16| 
gives the corresponding distribution in the real-valued case. 
These results are defined in terms of the density function of 
the celebrated Tracy-Widom law (illustrated in Fig. [TJ and 
imply the following theorem. 

Theorem 1 (Asymptotic Null Distribution): Let ^1 > be 
the largest eigenvalue of an n-dimensional sample covariance 
matrix C comprised of N i.i.d. observations x according 
to Q, where each vector x has i.i.d. Normal(0, ct^) entries. 
Defining the standardizing quantities 



N V 
N V 



/N 



1/3 
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< X 



we obtain as n,N oo, with n/N — > 7, the resuh 

where Fpi^x) is the distribution function for the Tracy-Widom 
law of order /3, with /3 = 1 if C e K" and S = 2 if C G C". 

Remark 1: For complex C, El Karoui |23| obtains a con- 
vergence rate of order (nATV)^/^ through small modifications 
of iJ,N,n. and (7N,n', Ma |j24| treats the case of real C similarly. 

Next, we consider properties of the empirical covariance 
estimator C when at least one signal is present, by way of the 
following sequence of nested alternate models: 

jTW^: = ^s,(i)aj + , < r < n. (6) 

In this setting, Baik et al. p7) discovered a phase-transition 
phenomenon in the size of the population eigenvalue. This 
work was further developed by Baik and Silversten p8) , 
Paul p9) , and Onatski pO| . Paul derived the limiting distri- 
bution for the case C € M", with the ratio n/N of sensors to 
snapshots tending to 7 < 1. Onatski later generalized this to 
7 S (0,cx)). Finally, Bai and Yao |25| derived these limiting 
distributions in the complex case. A simplified summary of 
the above work is given by the following theorem. 

Theorem 2 (Asymptotic Alternate Distribution): Consider 
a covariance C = E[x{t)x{t)*] under the model of (|6|, 
with r > distinct principal eigenvalues {A^ + a^}, and the 
corresponding sample covariance matrix C, with n ordered 
eigenvalues {£{}. Denoting by $(•) the standard Normal 
distribution function, and defining the standardizing quantities 

we have that if Ai, A2 

MAf,n(Ai) 



I Null 

I Alternate 



, Xq > -^70-^, then 



crN,n{Xi) 



< Xi 



,9 



i=l 



Otherwise, for any A^ : Ai < ^/ja'^, then £i (l + ^/j^- 

As in Theorem [T] /? = 1 for real data and 2 for complex data. 

Remark 2: Theorem |2] yields a critical threshold 
below which any population eigenvalue is unrelated to its 
corresponding sample eigenvalue; sample eigenvalues cor- 
responding to population eigenvalues above this threshold 
converge to a multivariate Normal with diagonal covariance. 

Paul and Onatski also give accompanying results linking the 
mutual information of population and corresponding sample 
eigenvecfori through the same critical threshold -y/T'cr^, and 
moreover implying the general inconsistency of the latter as 
estimators of the former. For the case of real-valued data, they 
prove the following theorem. 

Theorem 3 (Sample Eigenvector Inconsistency): Let 

denote the ith principal population eigenvector 
of E [x{t)x{t)'^'\ under the model of (|6]), and its 
corresponding sample version via (|5]). Then we have that 




if i — j and > y^a 
otherwise. 




Fig. 2. Example illustrating agreement of empirical and asymptotic distribu- 
tions for the null and alternate settings of Theorems[T]and|2] respectively. The 
latter case comprises a single signal eigenvalue A set at a signal-to-noise ratio 
of 3 dB, with all other parameter settings matched to the simulation study of 
Section |V] (complex- valued data with n = 9 and N = 45). 



Remark 3: Onatski gives a convergence rate of V iV for the 
quantities of Theorem [5] 

To conclude this section, we note that while the above 
results are asymptotic in nature, evidence suggests that they 
are achieved in practice for small sample sizes. In particular. 
Ma 1 24 1 has catalogued empirical convergence rates for Theo- 
rem [T] demonstrating that for n ranging up to 500, even with 
only N = 5 samples the Tracy-Widom asymptotics remain 
a good approximation in the upper tail of the distribution — 
the setting of interest in the model selection problem posed 
here. In later simulations, we apply our results to the model 
selection regime of direction-of-arrival estimation |[7|, and 
consider complex-valued data in n = 9 dimensions using 
iV = 45 snapshots. Figure |2] shows a comparison of empirical 
and asymptotic distributions in this scenario, with the generally 
good agreement providing further evidence for the practical 
utility of Theorems [T] and [2] above. 

IV. Minimax-Optimal Rank Estimation 

The previous section has given us a relatively complete 
description of the behavior of C, our n-dimensional sample 
covariance comprised of N array snapshots, with n/N tending 
to 7 e (0, 00) as ri, — > 00, and the variance of additive 
white Gaussian noise in the observation model Air of Q. 
With this information, we are ready to proceed to the task of 
estimating the model order r, corresponding to the number 
of signals present. In light of Theorems |2] and |3] we need 
not consider the rth signal if its strength is below the critical 
threshold ^cP', as the corresponding alternate model A^,. 
will typically be indistinguishable from the null M.r-\ in the 
asymptotic limit. In the sequel we thus restrict our attention 
to the case when A^ > -^70-^ for i = 1,2, . . . ,r. 

A. Derivation of Asymptotic Risk for Signal Absence/Presence 

To formulate our minimax-optimal rank estimation task, we 
adopt a classical decision-theoretic approach: we first define a 
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loss function L to measure the quality of a particular estimate 
of r, and then derive a decision rule 5 that minimizes the 
risk i? = E [i(i5)]; i.e., the expected loss under our assumed 
probability model. 

Consider first the most basic problem to which Theorems [T] 
and |2] offer a solution: differentiating between observing no 
signal at all (r — 0) and observing a single signal (r = 1). 
When r = 0, the snapshot x{t) is assumed to be a zero-mean 
multivariate Normal with covariance a^I. When the model 
order r — \, the population covariance matrix C has one 
eigenvalue equal to A + cr^, and the rest equal to a^. We 
encode these models by noting that A = in the first and 
A > in the second, and next address the task of choosing 
between them according to the sample covariance C. 

Using 5 to denote a decision rule taking values in {0, 1}, we 
let (5 = encode the decision r = 0. To this rule, we assign 
an "inclusion" penalty ci > for incorrectly overestimating 
the rank, and an "exclusion" penalty ce > for incorrectly 
underestimating it; when 5 chooses the correct outcome we 
assign no penalty. We summarize this by introducing the loss 
function L(A,(5), defined as 



when A = and 5 = 1, 
when A > and (5 = 0, 
otherwise. 



L{\5) 



Guided by the results of Sectio n [HI] we distinguish between 
the two cases above based on li{C), the principal eigenvalue 
of the observed sample covariance matrix C: If £i is larger 
than some fixed threshold, we estimate r as f = 1, and 
otherwise we choose f = 0. For a threshold T, we thus define 
our decision rule 8 as 



5T{t) 



1 if £ > T, 
otherwise. 



(7) 



The risk associated with this rule is given by evaluating the 
expected loss E [L(A, (5y(^i))] associated with our chosen test 
statistic li{C), with respect to probabilities P^. under the 
two competing models A^o and A^i: 



R{\5t) 



ci-VmA^i>T} when a = 0, 
Ce - ^mA^i < T} otherwise. 



Theorem [T| in turn describes the asymptotic distribution of ii 
when r = 0, while Theorem |2] describes it when r = 1. We 
thus obtain a precise asymptotic description of the risk R as 



ci 



I -Fa 



<7 N. 



when A = 0, 
otherwise. 



(8) 



where again $(•) denotes the standard normal CDF. 

Suppose we have knowledge that when a signal is present, 
its strength is at least equal to Aq. We may then choose a 
threshold T to minimize the maximum risk over all relevant 
scenarios. Specifically, we seek to minimize 



sup R{X,St) 

A6{0}U[Ao,oo) 



(9) 



It is not hard to show that this occurs when i?(0, St) = 
R{\o, St), and hence we conclude from ([Sj that T must solve 



ci 



(10) 



B. Asymptotic Analysis of Minimax Threshold Behavior 

While it is easy to compute the minimax-optimal T in ( fTO] ) 
numerically using bisection, and therefore implement the deci- 
sion rule 5t{(-i) of (|7|, we know of no closed-form expression 
for T. Instead, we now present a brief asymptotic analysis of 
the minimax threshold behavior in order to gain insight as to 
how T behaves as Aq varies. For clarity of presentation and 
without loss of generality, we assume cr-^ = 1 in the sequel. 

Our first observation stems from a comparison of the mean 
standardization quantities in Theorems [T] and [2] It is easily 
verified that fiN,n{X) — > /iAr,n as A — > y/^, implying a need 
for analysis when the minimal assumed signal strength Aq is 
close to y/j. Indeed, for other values of Aq a threshold T 
slightly above fiN.n will yield minimax risk very close to 0, 
since o-jv.n ^ N^^^^ and cr7v,n(Ao) ^ N^^/"^ in this case. 

To study the variation of T with Aq in this regime, we first 
parameterize Aq in h, with the restriction /i > 0, as 



Mh) = y/j + h. 



(11) 



The threshold behavior of interest occurs when h is of size 
iV~^/^, and so we parametrize T in t for this case as 

T{t) = flN^n+tcrN,n- (12) 

We summarize the behavior of t for h near N^^^^ in the 
following two lemmas, whose proofs are given in Appendix [B] 
Lemma 1: Let \o{h) and T{t) be parameterized as in ( [TT| ) 
and ( [T2I ), respectively, and fix ft, = o (A^^^/-^). The behavior 
of t then depends on the ratio of costs ce and ci as follows: 

1) If Ce > (l-F/3(0)) -ci, then 



t = 2(3- 



-1/2 



if ci 



1/6 



^(1-i^MO)) ) (1 + 0(1)). 



2) If Ce < (l-F^(0))-ci, then for = F^\l - ce/cj) , 
we have that 



Cl V ^TT 



1/6 



3) If Ce = (1 - F^{0)) ■ Cl, then t solves 



-1 Ce 



(1 + 0(1)0 



1/6 



■ exp 



ci V /37r Vt^/" + 7"^/^ 
^^2 /y/4_^^-i/4x 1/3 



(l + o(l)). 
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Lemma 2: Suppose instead that h = HqN ^/■^ for some 
constant ho > 0. Then we have the result that 

/ /?l/2f / , , \ 1/6 

c..(l-F,(t))^CE-<i>(^|^(7^/^+7-^/^) 

/3l/2/i3/2 \ 



^(^1/4+^-1/4) 

Moreover, if it is also the case that ce — uj{ci), then 



_ / 8/io Ce 

Y /3(7l/4+ ^-1/4)1/6 d ' 

if instead we have that ce — o(ci), then t ^ (^jj^ log 



2/3 



C. Extension to the General Model Order Selection Problem 

In the above discussion we treated the basic model order 
selection problem of Mr : r = versus Mr : r = 1, in order 
to develop our problem formulation and asymptotic results. 
In practice, of course, techniques are needed to address the 
general model order selection problem of estimating r > 0. 
The main result of this section is that an asymptotically 
minimax-optimal rank selection rule is in fact obtained through 
repeated application of the basic A^q vs. M.i case. 

To develop such a rule and verify its properties, we first 
extend the thresholding approach seen earlier to the case of 
arbitrary r > 0. Rather than specifying only a single cost ce 
for incorrectly excluding a term, we now require a sequence 
{cE(j)}"=i of nonnegative costs, corresponding to exclusion 
of respective signal terms. To this end we define a cumulative 
exclusion cost Ce(-) as 



CE(i), 1 < j < 



(13) 



One possible choice for the sequence {cE(«)}f^i is simply to 
set all exclusion costs to be equal; alternatively, with prior 
knowledge that there are at most rmax signals, one might well 
set CE(i) = for i > rmax- 

In a similar manner, we define a sequence of thresholds 
{T(«)}f^j^ associated to the sequence of ordered sample eigen- 
values with each threshold T{i) determined by an 
inclusion cost c\ and the corresponding exclusion cost C^ii)- 
An ordered rank selection procedure for r is then given by 
Algorithm [T| below, and verified to be asymptotically minimax 
optimal by the theorem that follows. 

Algorithm 1 Minimax-optimal rank selection procedure 

1) Fix a threshold sequence {T(i)}"^j^ via ( [TO] i and pre- 
assigned erroneous inclusion/exclusion costs ci,CE(i); 

2) Form the sample covariance C, set i <— 1, and test: 

while ^^{C) > T{i) do 

i -s— i + 1 
end while 

3) Return i — 1 as the final estimate of rank r. 



Theorem 4 (Minimax Optimality): For fixed, nonnegative 
inclusion cost ci and exclusion costs {Ce(«)}, the rank se- 
lection procedure of Algorithm [T] is asymptotically minimax. 

Proof: Note that the ith iteration of Step 2 in Algorithm [T] 
tests Air : r — i — 1 versus Air : r > i. By (|9]l7 
the worst-case risk occurs when r — ^ oo and each signal 
has strength equal to the minimum assumed strength Aq, in 
which case the cost for excluding them all is given by Ce(«) 
in ( pjj ). Thus the optimal threshold at this stage is given 
by the corresponding T{i) satisfying ( [TO] i. Since Theorem |2] 
proves the asymptotic independence of the r principal sample 
eigenvalues, we conclude in turn that considering only the ith 
eigenvalue £i at the ith iteration incurs no loss in power. ■ 

According to ( [TOj i, knowledge of the noise power cr^ is 
required to determine optimal thresholds. In an adaptive set- 
ting, we note that cr^ may be estimated at time t by way of 
the residual variance from the f{t — 1) signals at time index 
t — 1 |8j. In a non-adaptive setting, Kritchman and Nadler 1 12| 
and Patterson et al. p6) suggest alternative approaches. 



V. Empirical Performance Examples 

Having derived a rank selection rule in the preceding section 
and investigated its theoretical properties, we now provide two 
brief simulation studies designed to demonstrate the empirical 
performance of this procedure. We report on an evaluation 
of Algorithm [T] by way of two simulations adopted from the 
direction-of-arrival estimation setting of [7|, shown in the top 
panels of Fig. [3] The first simulation has signals of different 
strengths appearing and disappearing over time, leading to 
a varying rank, whereas the second simulation comprises a 
constant number of signals. In both settings, the snapshots are 
in n 9 dimensions, and arrival directions vary over time. 

Each simulation features a range of signal strengths, in- 
cluding some so low as to be indistinguishable from the 
background noise. In the first simulation, for example, there 
is one signal below the detection threshold ^ between time 
indices 150 and 400; in the second simulation, the weakest 
signal is always below the detection threshold, implying that 
it will not in general be detected. 

In these performance examples we employed a windowed 
covariance estimate with = 45 observations, and set 
ci = ce(1) = • • • = CE{n), with Ao = ^ + N-'^/^. We 
treat as unknown, and estimate it for every time t as the 
mean of the estimated noise eigenvalues at the previous time 
step t — 1 i8|. The middle panels of Fig. |3] demonstrate the 
corresponding rank estimation results, from which we observe 
good empirical agreement with theoretical predictions; where 
we incur estimation errors, they tend to be as a result of signals 
whose strength falls below the detection limit. The bottom 
panels of Fig. [3] also show the error in squared Frobenius 
norm, ||W^Vl^* — Wa;WjI'|||,, of the corresponding subspace 
estimate Wk = {jili w^i ■ ■ ■ Wk) for k — r and k = f, 
respectively the true and estimated ranks. 

For purposes of comparison, we also show a hypothesis- 
testing-based approach adopted from Kritchman and 
Nadler |12|, based on Tracy- Widom quantiles and set at a 
fixed 0.5% false alarm rate as suggested by those authors. We 
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Fig. 3. Two simulations taken from a di rect ion-of-arrival estimation setting, showing empirical performance of the minimax rank selection rule of Algorithm^ 
and the hypothesis-testing-based rule of fT2\. Signals of the form Co(l, e^'^, . . . , e^**'^) appear and disappear over time t £ [0, 1000], with signal directions 
and strengths shown in the top row. One snapshot is sampled per unit time, and the subspace at time t is estimated using the snapshots for times in the window 
(t — 45, t]. The second row of the plot shows the true rank r as a functiori_of t^along with the estimated rank r according to the method of Algorithm [T] 
The subspace approximation error in squared Frobenius norm, || W^W* — VVfeWj*|||^, is plotted for fe = r, r in the third row. 
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Fig. 4. Simulations demonstrating tliat the performance of tlie rank estimators improves as the snapshot sampling fretjuency increases. With the same signal 
arrival patterns as in Fig. pi we vary the snapshot sampling rate and show how the rank estimation error /o^'^" kC*) ~ ^.^ behaves for the two 
rank estimators r. The poinls show the mean behavior and the error bars show 1 standard deviation, computed from 50 replicates of the experiment. 



note however, that the variance estimation approach of p2| 
does not apply directly in the subspace tracking scenario, as 
it employs all eigenvalues of the sample covariance matrix; 
here we employed an estimate based on the residual from the 
previous time step. We see that in comparison to the proposed 
method of Algorithm [T] this approach tends to provide a 
consistently more conservative estimate of rank, resulting in 
greater overall error in this simulation context. 

In Fig. |4] we show how these rank estimators perform as 
the snapshot sampling frequency per unit time is increased. 
Since the number of sensors is held fixed, this corresponds to 
varying 7; the simulations of Fig.|4]demonstrate that for higher 
sampling frequencies (lower values of 7), the rank estimation 
problem becomes easier and the error decreases. Note in the 
left-hand panel of Fig. |4] however, that because the rank is 
not constant within all time windows, the resultant error will 
not necessarily asymptote to zero. In contrast, that of the 
right-hand panel will eventually reach zero when 7 becomes 
sufficiently small to render the weakest signal detectable (as 
per Theorem |2]i. 

VI. Discussion 

In this article we have presented sample covariance asymp- 
totics stemming from random matrix theory, and have in 
turn brought them to bear on the problem of optimal rank 
estimation. This task poses a classical model order selection 
problem that arises in a variety of important statistical signal 
and array processing systems, yet is addressed relatively 
infrequently in the extant literature. Key to our approach is the 
existence of a phase transition threshold in the context of the 
standard array observation model with additive white Gaussian 
noise, below which eigenvalues and associated eigenvectors 
of the sample covariance fail to provide any information on 
population eigenvalues. Using this and other results, we then 



developed a decision-theoretic rank estimation framework that 
led to a simple ordered selection rule based on thresholding; in 
contrast to competing approaches, this algorithm was shown 
to admit asymptotic minimax optimality and to be free of 
tuning parameters. We concluded with a brief simulation study 
to demonstrate the practical efficacy of our rank selection 
procedure, and plan to address a diverse set of rank estimation 
tasks as part of our future work. 

Appendix A 
Tracy- WiDOM Asymptotics 

A characterization of the tail behavior of F^(s), the distri- 
bution function of the Tracy-Widom law, is required to obtain 
the results of Lemmas [T] and |2] In this appendix we derive the 
asymptotic properties of Fp{s) for /3 = 1, 2 as |s| ^ 00. 

To begin, let q{x) solve the Painleve II equation 

q"{x)^xq{x) + 2q^x), 

with boundary condition q{x) ^ Ai{x) as x 00 and Ai(x) 
the Airy function. Then it follows that 

Fi(s) = exp + s)q^{x)dx^ , 

and 

As x — > 00, the Airy function behaves as Ai(a::) 
^^x^^/^cxp (— Ix'^/^) ; asymptotic properties of g as a; — > 
~oo are studied by Hastings and McLeod |27|, who show 
that in this case, q{x) ^ y/\x\/2. Using these facts, we can 
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compute for s — > cx) the term 



and 



q{x)dx 
1 



F2{s) - exp - 



12 



20f 
1 

1 



^-i/4exp(-|a;3/Mds 



In summary, then, for /? = 1, 2 we have that as s 



exp 



(^-^(x + sf/^ ~ i log(a; + s)^ da; 



^/3(s) ^ exp - — 



24' 



Jo '^n^r 



3/2 



3 X 



1 



log s dx 



while for s ^ cx) we have 



1 



2V7? 

1 



-1/4 

s 'exp 



s '^/'^ exp 



«3/2 



2 s y 4 

exp ( — s^^^x 1 da; 



l-F^(s) 



1 

16^ 



/3/2 



s -3/3/4 exp 



2/3 



.3/2 



(14) 



(15) 



,3/2 



and also 



(a; — s)q^{x)dx 

- ^ l^i^ - «)^"'^' exp ^2; 

(^-^(x + s)3/2-ilog(a; + s)^dx 



a; exp 



X exp 



Appendix B 
MiNiMAX Threshold Asymptotics 

In this appendix we prove Lemmas [T] and |2] Recall that 
by ( [TT| we have the parameterization Ao(/i) = + for 
some fixed h > 0, and we seek asymptotic properties of 
the associated minimax eigenvalue threshold T, parameterized 
according to ( [T2] i as T{t) = iiN,n + taN,n- To derive the 
asymptotic behavior of T for small and large h, we first require 
the asymptotic behaviors of ^lN,n{^/l + h) and cr n ,n{-\/l + h) 
for small h, along with the tail behaviors of $(a;) and Fi){x). 

To this end, it is not hard to show that for small h, 

fJ-N,n{Vl + h) = HN,n + — + 0{h^), 

V7 



4^3/2 



4.^"''^^PV 3 



— ,s-3/2expf-^s3/2 
167r V 3 



Likewise, for s — > — oo we have that 



X exp (-2.1/2^) dx a^_„(^ + h)^ 2r'/Hl'^' + O (^) 

((y/4^^-l/4)/^)2/3 



Since crAr „ = 
we have that 



for h = 0(iV-i/2) 



and 



q{x)dx ' 



{x — s)q^{x)dx • 



|3/2 



T - flN,ni^ + h) 
(^N,n{Jl + h) 



.-1/4 



1/6 



27 7I/4 + ^- 



T7i + O (Vn/^) • 



12 



Now, we must have that as s 00, 

Fi(s) - exp i - - I :^^s""''e~3 



Using the notation a; ^ y to denote x — j/ (1 + o(l)), a 
standard result | |28| is that as a; -> 00, we have 

1 f I 



1 f l_^_3/4^-a.3/2 

2 V2V7r^ 



1 - $(.a;) 



X exp 



1 

16^' 



,-3/2g-|.^V2 



exp 



4^7? 



Therefore, as e ^ 0, \/21ogpi. 

Tail properties of (x) are derived in Appendix |Aj and 
given by ( [T4j i and ([TSj. From these, we have that as e 0, 



l-^s-3/4cxpf--s3/2 
4^/^ 3 



and 



loge-i^ 



2/3 



and similarly 

F^is) ^ 1 
We also get that as s — > — 00, 
Fi{s) ^ exp 



— s-3/2exp(-^s3/2 
IGir V 3 



24 



Equipped with these results, we are now ready to give the 
proofs of Lemmas [T] and [2] 

Proof of Lemma then 

' P^H /7l/4 + ^-l/4X 1/6^ 



= $ 
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When t = O {{h^Ny/^), the left-hand side of ([T0| converges 
to ci(l - ^>(0)). Therefore, 



o 



/3 



Of course, this only makes sense when ce > (1 — -F/sCO)) • q. 
Otherwise, we must have t = a; ((/I'^iV)^/^). In this case, 
using <S>{x) = 1 - (l/\/2^)a;-i exp(-a;V2)(l + 0{x-'^)) as 
X — ?► oo, we obtain for t > the expression 



•i ^ exp 



1/6 



8 



1/3N 



l + 0(%/ft3]V/t3)) 



Consequently, 



• t ^ exp 



1 + 0(7^/^3)) 




l + 0(\/ft3]v)) 



where = Fg ^ ^1 — ^j. Likewise, this expression only 
makes sense when ce < (1 — Ff;{0)) ■ ci. The last case we 
need to consider is when ce = (1 — i^;3(0)) • ci. In this case 
we have that t solves 



t' = iMo) 



-1 Ce 



■ exp 



2_ f h^N \ 'Z*^ 

l/4+^-l/4xl/3\ ^ 

h+0 (V7^3ivjj 



Proof of Lemma |2[ We now suppose instead that h 
h^N^-^/^, for some constant > 0. In this case 

T-/ijv,«(V7 + fe) ^ ^ / 1/4 ^ ^-1/4^ 



/3l/2/,3/2 



72^(^1/4 + 7-1/4) 

If Ce — w(ci), then we must have t — > —00 so that the left- 
hand side of ( fTO] ) converges to cj. Using the tail behavior of 
$(•), we have that 



2Vho 
so that 



Ce 
Ci 



_ / 8/10 Ce 



If, on the other hand, ce = o(ci), then the right-hand side 
of ( [Tol l must converge to ce, and 

2/3 



P 



Cp 

1 - — 

Cl 



3 ci 
2^'°^CE 
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